Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition
نویسنده
چکیده
In this paper, we present a framework of a factored deep convolutional neural network (CNN) learning for noise robust automatic speech recognition (ASR). Deep CNN architecture, which has attracted great attention in various research areas, has also been successfully applied to ASR. However, to ensure noise robustness, since merely introducing deep CNN architecture into the acoustic modeling of ASR is insufficient, we introduce factored network architecture into deep CNN-based acoustic modeling. The proposed factored deep CNN framework factors out feature enhancement, delta parameter learning, and hidden Markov model state classification into three specific network blocks. By assigning specific roles to each block, the noise robustness of deep CNN-based acoustic models can be improved. With various comparative evaluations, we reveal that the proposed method successfully improves ASR accuracies in noise environments.
منابع مشابه
شبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملAn Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملSpeech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملEstimation of Hand Skeletal Postures by Using Deep Convolutional Neural Networks
Hand posture estimation attracts researchers because of its many applications. Hand posture recognition systems simulate the hand postures by using mathematical algorithms. Convolutional neural networks have provided the best results in the hand posture recognition so far. In this paper, we propose a new method to estimate the hand skeletal posture by using deep convolutional neural networks. T...
متن کاملEvaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
Deep Neural Network (DNN) based acoustic models have shown significant improvement over their Gaussian Mixture Model (GMM) counterparts in the last few years. While several studies exist that evaluate the performance of GMM systems under noisy and channel degraded conditions, noise robustness studies on DNN systems have been far fewer. In this work we present a study exploring both conventional...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017